Error Annotation of the Arabic Learner Corpus - A New Error Tagset

Authors

  • Abdullah Alfaifi
  • Eric Atwell
  • Ghazi Abuhakema
Abstract

This paper introduces a new two-level error tagset, AALETA (Alfaifi Atwell Leeds Error Tagset for Arabic), for annotating the Arabic Learner Corpus (ALC). The tagset comprises six broad classes, subdivided into 37 more specific error types or subcategories. It is easily understood by Arabic corpus error annotators. AALETA is based on an existing error tagset for Arabic corpora, ARIDA, created by Abuhakema et al. [1], and on a number of other error-analysis studies. It was used to annotate texts of the Arabic Learner Corpus [2]. The paper presents the tagset's broad classes and their types or subcategories, together with an example of annotation. The understandability of AALETA was measured against that of ARIDA; the preliminary results showed that AALETA achieved a slightly higher score, and annotators reported that they preferred using AALETA over ARIDA.

Similar resources

Annotating an Arabic Learner Corpus for Error

This paper describes an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation and performing Computer-aided Error Analysis (CEA) on the data. We adapted the French Interlanguage Database FRIDA tagset (Granger, 2003a) to the data. We chose FRIDA in order to follow a known standard and to see whether the changes needed to move from a Frenc...

ARIDA: An Arabic Interlanguage Database and Its Applications: A Pilot Study

This paper describes a pilot study in which we collected a small learner corpus of Arabic, developed a tagset for error annotation of Arabic learner data, tagged the data for error, and performed simple Computer-aided Error Analysis (CEA). Language Learner Corpora and Applications Learner corpora research uses the methods and tools of Second Language Acquisition (SLA) studies and corpus linguist...

Error Annotation for Corpus of Japanese Learner English

In this paper, we discuss how error annotation for learner corpora should be done by explaining the state of the art of error tagging schemes in learner corpus research. Several learner corpora, including the NICT JLE (Japanese Learner English) Corpus that we have compiled, are annotated with error tagsets designed by categorizing “likely” errors implied from the existing canonical grammar rules...

Grammatical Error Annotation for Korean Learners of Spoken English

The goal of our research is to build a grammatical error-tagged corpus for Korean learners of Spoken English dubbed Postech Learner Corpus. We collected raw story-telling speech from Korean university students. Transcription and annotation using the Cambridge Learner Corpus tagset were performed by six Korean annotators fluent in English. For the annotation of the corpus, we developed an annota...

The Overview of the SST Speech Corpus of Japanese Learner English and Evaluation Through the Experiment on Automatic Detection of Learners' Errors

This paper introduces an overview of the speech corpus of Japanese learner English compiled by National Institute of Information and Communications Technology by showing its data collection procedure and annotation schemes including error tagging. We have collected 1,200 interviews for three years. One of the most unique features of this corpus is that it contains rich information on learners’ ...

Publication date: 2013